
Unified input API: artifact_id + data replaces input_csv #207

Merged
RafaelPo merged 8 commits into main from feat/unified-input-api-v2
Feb 24, 2026

Conversation

@RafaelPo
Contributor

Summary

  • Replace input_csv with a unified artifact_id + data input API across all MCP tools (screen, rank, dedupe, merge, agent, single_agent)
  • Add uploads.py with HMAC-signed upload URL support for multi-pod deployments
  • Move input resolution logic to model properties; add Google Sheets /pub URL normalization
  • Reject empty inline data (data=[]) in validators to prevent zero-row task submissions
  • Log input source (artifact_id vs data) on task submission

Test plan

  • All 257 unit tests pass
  • Input validation: artifact_id and data are mutually exclusive, empty data rejected
  • Google Sheets URL normalization handles /pub, /edit, and export URLs
  • HMAC upload URL signing roundtrip, expiry, and tamper detection
  • Manual: run integration tests against staging API

🤖 Generated with Claude Code

RafaelPo and others added 6 commits February 24, 2026 19:46
Processing tools now accept artifact_id (UUID from upload_data) or
data (list[dict]) instead of input_csv/input_data/input_json. Adds
upload_data tool for URL/file ingestion and request_upload_url for
presigned large-file uploads in HTTP mode.

Phase 1: Simplified _SingleSourceInput and MergeInput models
Phase 2: upload_data tool (URL + local path + Google Sheets)
Phase 3: Presigned URL upload system (HMAC, Redis metadata, REST endpoint)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Adopt _aid_or_dataframe and _input_data_mode properties on models,
  removing free functions from tools.py
- Add left/right properties to MergeInput
- Use `is not None` for artifact_id checks
- Fix sync context manager in uploads.py (with → async with)
- Remove duplicate max_inline_rows in config.py
- Update integration tests to use data instead of removed input_csv
- Reject empty CSV in upload_data local file path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- /pub URLs now correctly convert to /export?format=csv
- Headers-only CSV from URL raises clear "empty CSV" error instead
  of misleading "could not parse as CSV or JSON"
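The /pub-to-export conversion described above can be sketched roughly as follows. The function name and regex are hypothetical, not the PR's actual helper, and real Google Sheets URLs have more variants than this handles:

```python
import re


def normalize_sheets_url(url: str) -> str:
    """Sketch: rewrite Google Sheets /pub and /edit URLs to a CSV export URL.

    Assumes the document ID (optionally prefixed with e/ for published
    sheets) immediately follows /spreadsheets/d/.
    """
    m = re.match(r"(https://docs\.google\.com/spreadsheets/d/(?:e/)?[\w-]+)", url)
    if not m:
        return url  # not a Sheets URL; leave it untouched
    return f"{m.group(1)}/export?format=csv"
```

Normalizing up front means the downstream fetcher always receives a CSV endpoint rather than an HTML page.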

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove auto-generated per-process secret — it breaks when pods
don't share state. Now fails fast with a clear error if unset.
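The sign/verify roundtrip, expiry, and tamper detection mentioned in the test plan follow a standard HMAC pattern. A minimal sketch (not the PR's actual `uploads.py` code; the message layout and parameter names are assumptions):

```python
import hashlib
import hmac
import time

# In the PR this comes from configuration and is required in HTTP mode;
# it must be shared across pods, never auto-generated per process.
SECRET = b"upload-secret-from-config"


def sign_upload(upload_id: str, expires_at: int, secret: bytes = SECRET) -> str:
    """Sign an upload id plus expiry timestamp with HMAC-SHA256."""
    msg = f"{upload_id}:{expires_at}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def verify_upload(upload_id: str, expires_at: int, signature: str,
                  secret: bytes = SECRET) -> bool:
    """Reject expired tokens, then compare signatures in constant time."""
    if time.time() > expires_at:
        return False
    expected = sign_upload(upload_id, expires_at, secret)
    return hmac.compare_digest(expected, signature)
```

Because every pod derives the same signature from the shared secret, any pod can verify a URL signed by any other, which is why a per-process secret breaks multi-pod deployments.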

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add len(v) == 0 check to _SingleSourceInput and MergeInput
validate_data_size validators to prevent wasteful zero-row task
submissions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@RafaelPo
Contributor Author

@claude code review

@github-actions
Contributor

github-actions bot commented Feb 24, 2026

Claude finished @RafaelPo's task in 2m 12s


Code Review: Unified Input API v2

  • Read repo guidelines
  • Read all changed files and understand the diff
  • Analyze code for bugs, security issues, and improvements
  • Post review findings

Overall this is a well-structured refactor. The artifact_id + data API is cleaner than the previous input_csv + data (CSV string | JSON) approach, and the upload system is solid. Tests cover the important paths. A few issues worth addressing:


1. SSRF risk in fetch_csv_from_url (security)

everyrow-mcp/src/everyrow_mcp/utils.py:63-76

fetch_csv_from_url follows redirects and fetches arbitrary user-supplied URLs with no restrictions on internal/private IPs. In HTTP mode (server-side), this enables SSRF against cloud metadata endpoints (http://169.254.169.254/...), internal services, etc.

Consider adding a deny-list for private IP ranges, or using a library like ssrf-guard to validate resolved addresses before connecting.
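One possible shape for such a deny-list check — resolve the hostname first and reject private, loopback, link-local, and reserved addresses before connecting. This is a sketch of the suggested mitigation, not code from the PR (and note it does not by itself cover redirects, which would need re-checking per hop):

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_url(url: str) -> bool:
    """Resolve the URL's host and reject non-public addresses.

    Blocks e.g. http://169.254.169.254/... (cloud metadata, link-local)
    and http://127.0.0.1/... (loopback) before any connection is made.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Checking the *resolved* addresses (rather than the hostname string) also defends against DNS names that point at internal IPs.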



2. No response size limit when fetching URLs (resource exhaustion)

everyrow-mcp/src/everyrow_mcp/utils.py:74 / everyrow-mcp/src/everyrow_mcp/tools.py:624-625

The upload endpoint enforces max_upload_size_bytes (50 MB), but fetch_csv_from_url reads the full response into memory with no size cap. A large URL response could cause OOM. Consider streaming the response and enforcing a size limit consistent with max_upload_size_bytes.



3. Stale docstring references left_csv / right_csv (bug)

everyrow-mcp/src/everyrow_mcp/tools.py:456-457

The everyrow_merge docstring still says:

left_csv = the table being enriched (ALL its rows appear in the output).
right_csv = the lookup/reference table (its columns are appended to matches).

These should reference the new field names (left_artifact_id/left_data and right_artifact_id/right_data).



4. Misleading upload_secret description (config)

everyrow-mcp/src/everyrow_mcp/config.py:81

Description says "Auto-generated if empty" but _get_secret() in uploads.py:58 raises RuntimeError when empty. The description should say something like "Required in HTTP mode for HMAC signing."



5. Dead code: load_data function (cleanup)

everyrow-mcp/src/everyrow_mcp/utils.py:206-258

load_data is no longer imported in production code (tools.py now uses params._aid_or_dataframe instead). The test comment at test_utils.py:105 says "still used by upload_data" but the upload_data tool uses pd.read_csv / fetch_csv_from_url directly. Consider removing load_data and its tests, or if it's intended as a public utility, update the comment.


6. Internal errors leaked in upload error responses (hardening)

everyrow-mcp/src/everyrow_mcp/uploads.py:204,218

return JSONResponse({"error": f"Could not parse CSV: {exc}"}, status_code=400)
...
return JSONResponse({"error": f"Failed to create artifact: {exc}"}, status_code=500)

Exception messages can leak internal details (stack frames, file paths, DB errors). For the 500 case especially, consider logging the full error and returning a generic message to the client.


7. Minor: _is_url naming convention

everyrow-mcp/src/everyrow_mcp/utils.py:14

_is_url uses a private-by-convention leading underscore but is imported across modules (tools.py, models.py). Consider renaming to is_url to signal it's part of the public API for this package.


Summary

The core design is good — the artifact_id + data API is a clean improvement, validation is thorough, and the HMAC upload system is well-structured with one-use tokens via Redis GETDEL. The issues above are focused on SSRF protection (items 1-2) and correctness (items 3-4), which are the most important to address before merge. Items 5-7 are minor cleanup that can be done at any time.

- Fix misleading upload_secret description: say "Required in HTTP mode"
  instead of "Auto-generated if empty" (it raises RuntimeError)
- Remove unused load_data() and its 8 tests — superseded by model
  properties (_aid_or_dataframe) and upload_data tool
- Rename _is_url → is_url since it's imported across modules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@RafaelPo RafaelPo requested review from hnykda and rgambee February 24, 2026 19:57
```diff
 task_type=PublicTaskType.AGENT,
 session_url=session_url,
-total=len(df),
+total=total,
```

When a task is submitted via artifact_id (not inline data), the row
count is unknown client-side. The fallback labels now say "artifact"
instead of the vague "rows for X" to make clear that the server-side
artifact is being processed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@RafaelPo RafaelPo merged commit 5fd8d45 into main Feb 24, 2026
5 checks passed
@RafaelPo RafaelPo deleted the feat/unified-input-api-v2 branch February 24, 2026 20:05

```python
if is_url(params.source):
    df = await fetch_csv_from_url(params.source)
else:
```

Bug: The everyrow_upload_data function lacks error handling for malformed local CSV files; a parse failure in pandas raises an unhandled exception.
Severity: MEDIUM

Suggested Fix

Wrap the pd.read_csv(params.source) call within a try-except block. Catch potential exceptions from pandas, such as ParserError, and return a user-friendly error message, similar to the error handling implemented for the REST endpoint in uploads.py.
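The suggested wrap could be sketched as follows (the function name is hypothetical; the PR calls `pd.read_csv(params.source)` inline):

```python
import pandas as pd


def read_local_csv(path: str) -> pd.DataFrame:
    """Sketch: surface CSV parse failures as a clear, user-facing error
    instead of letting pandas exceptions propagate out of the tool."""
    try:
        df = pd.read_csv(path)
    except (pd.errors.ParserError, pd.errors.EmptyDataError, UnicodeDecodeError) as exc:
        raise ValueError(f"Could not parse {path} as CSV: {exc}") from exc
    if df.empty:
        # headers-only or zero-row files should be rejected up front
        raise ValueError(f"{path} contains no data rows")
    return df
```

This mirrors the error handling the REST endpoint in uploads.py already performs for uploaded bytes.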

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: everyrow-mcp/src/everyrow_mcp/tools.py#L626

Potential issue: The `everyrow_upload_data` function does not handle potential errors
when parsing a local CSV file using `pd.read_csv`. While the existence of the file is
checked, the content is not validated. If the file is malformed or not a valid CSV,
`pd.read_csv` will raise an unhandled exception, which will propagate up and cause the
tool to fail. This is inconsistent with other parts of the application, such as the REST
endpoint and the URL-based upload functionality, which both include explicit error
handling for CSV parsing.

